Abstract

Many techniques for handling missing data have been proposed in the literature. Most of these techniques are overly complex. This paper explores an imputation technique based on rough set computations. Characteristic relations are introduced to describe incompletely specified decision tables. It is shown that the basic rough set idea of lower and upper approximations for incompletely specified decision tables may be defined in a variety of different ways. Empirical results obtained using real data are given and they provide a valuable and promising insight into the problem of missing data. Missing data were predicted with an accuracy of up to 99%.

Key words: Indiscernibility, membership, missing data, rough sets, set approximation

1 Introduction



There are three general ways that have been used to deal with the problem of missing data (Little and Rubin, 1987). The simplest method is known as 'listwise deletion', which simply deletes instances with missing values. The major disadvantage of this method is the dramatic loss of information in data sets. Kim and Curry (1997) found that when 2% of the features are missing and the complete observation is deleted, up to 18 percent of the total data may be lost. The second common technique imputes the data by finding estimates of the values, and missing entries are replaced with these estimates. Various estimates have been used, including zeros, means and other statistical calculations. These estimates are then used as if they were the observed values. Another common technique assumes some model for the prediction of the missing values and uses the maximum likelihood approach to estimate the missing values.
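The first two strategies described above can be sketched in a few lines of Python. This is only an illustration: the records, the attribute names and the use of `None` as the missing-value marker are assumptions made for the example and do not come from the data used in this paper.

```python
from statistics import mean

# Toy records; None marks a missing value (illustrative data only).
records = [
    {"age": 25, "parity": 1},
    {"age": None, "parity": 2},
    {"age": 31, "parity": None},
    {"age": 40, "parity": 0},
]

def listwise_deletion(rows):
    """Keep only the rows that have no missing entries."""
    return [r for r in rows if all(v is not None for v in r.values())]

def mean_imputation(rows):
    """Replace each missing entry with the mean of the observed values in its column."""
    columns = list(rows[0].keys())
    means = {c: mean(r[c] for r in rows if r[c] is not None) for c in columns}
    return [{c: (r[c] if r[c] is not None else means[c]) for c in columns} for r in rows]

complete = listwise_deletion(records)   # half of the rows are discarded
imputed = mean_imputation(records)      # all rows kept, gaps filled with column means
```

Even in this small example, listwise deletion discards half of the rows although only two of the eight cells are missing, which is the information loss the text refers to.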

A great deal of research has been conducted to find new ways of approximating the missing values. Among others, Abdella and Marwala (2006) and Mohamed and Marwala (2005) have used neural networks together with Genetic Algorithms (GA) to approximate missing data. Qiao et al. (2005) have used neural networks and Particle Swarm Optimization (PSO) to keep track of the dynamics of a power plant in the presence of missing data. Nauck and Kruse (1999) and Gabrys (2002) have used neuro-fuzzy techniques for learning in the presence of missing data. A different approach was taken by Wang (2005), who replaced incomplete patterns with fuzzy patterns. The patterns without missing values are, along with fuzzy patterns, used to train the neural network. In his model, the neural network learns to classify without actually predicting the missing data. Special attention in the literature has been given to imputation techniques such as expectation maximisation, as well as the use of neural networks coupled with an optimisation technique such as genetic algorithms. The use of neural networks comes with a greater cost in terms of computation, and data has to be made available before the missing condition occurs. This paper proposes a new algorithm based on rough set theory for missing data estimation. Although other similar methods have been mentioned in the literature (Nakata and Sakai, 2006; Grzymala-Busse, 2004), this paper applies a rough set technique for missing data imputation to a large and real database for the first time. It is envisaged in this work that in large databases, it is more likely that the missing values could be correlated to some other variables observed somewhere in the same data. Instead of approximating missing data, it might therefore be cheaper to spot similarities between the observed data instances and those that contain missing attributes.



2 Applications of Rough Sets 



There are many applications of rough sets reported in the literature. Most of the applications assume that complete data is available (Grzymala-Busse, 2004). This is, however, not often the case in real life situations. There is also a great deal of information regarding various applications of rough sets in medical data sets. Rough sets have been used mostly in prediction cases, and Rowland et al. (1998) compared neural networks and rough sets for the prediction of ambulation following a spinal cord injury. Although rough sets performed slightly worse than neural networks, they proved that they can still be used in prediction problems. Rough sets have also been used in learning malicious code detection (Zhang et al., 2006) and in fault diagnosis (Tay and Shen, 2003). Grzymala-Busse and Hu (2001) have presented nine approaches to imputing missing values. Among others, the presented methods include selecting the most common attribute value, the concept most common attribute value, assigning all possible values related to the current concept, deleting cases with missing values, treating missing values as special values, and imputing missing values using other techniques such as neural networks and maximum likelihood approaches. Some of the techniques proposed come at an expense, either in terms of computation time or loss of information.

3 Rough Set Theory



The rough sets theory provides a technique of reasoning from vague and imprecise data (Goh and Law, 2003). The technique is based on the assumption that some information is associated somehow with some information of the universe of the discourse (Komorowski et al., 1999; Yang and John, 2006). Objects with the same information are indiscernible in the view of the available information. An elementary set consisting of indiscernible objects forms a basic granule of knowledge. A union of elementary sets is referred to as a crisp set, otherwise the set is considered to be rough. The next few subsections briefly introduce concepts that are common to rough set theory.



3.1 Information System 

An information system ($\mathcal{A}$) is defined as a pair $(U, A)$ where $U$ is a finite set of objects called the universe and $A$ is a non-empty finite set of attributes, as shown in Eq. 1 below (Yang and John, 2006).

$$\mathcal{A} = (U, A) \tag{1}$$

Every attribute $a \in A$ has a value which must be a member of a value set $V_a$ of the attribute $a$.

$$a : U \rightarrow V_a \tag{2}$$

A rough set is defined with a set of attributes and the indiscernibility relation between them. Indiscernibility is discussed next.



3.2 Indiscernibility Relation 



The indiscernibility relation is one of the fundamental ideas of rough set theory (Grzymala-Busse and Siddhaye, 2004). Indiscernibility simply implies similarity (Goh and Law, 2003). Given an information system $\mathcal{A}$ and subset $B \subseteq A$, $B$ determines a binary relation $I(B)$ on $U$:

$$(x, y) \in I(B) \text{ iff } a(x) = a(y) \tag{3}$$

for all $a \in B$, where $a(x)$ denotes the value of attribute $a$ for element $x$. Eq. (3) implies that any two elements that belong to $I(B)$ should be identical from the point of view of $a$. Suppose $U$ has a finite set of $N$ objects $\{x_1, x_2, \ldots, x_N\}$. Let $Q$ be a finite set of $n$ attributes $\{q_1, q_2, \ldots, q_n\}$ in the same information system $\mathcal{A}$, then,

$$\mathcal{A} = (U, Q, V, f) \tag{4}$$

where $f$ is the total decision function called the information function. From the definition of the indiscernibility relation given in this section, any two objects have a similarity relation to attribute $a$ if they have the same attribute values everywhere except for the missing values.
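As an illustration of Eq. (3), the following Python sketch groups the objects of a small, complete table into the equivalence classes induced by a chosen attribute subset B. The table, the object labels and the function name `indiscernibility_classes` are hypothetical and serve only to make the definition concrete.

```python
from collections import defaultdict

def indiscernibility_classes(table, attributes):
    """Group objects whose values agree on every attribute in `attributes`."""
    classes = defaultdict(set)
    for obj, row in table.items():
        # Objects with the same tuple of values are indiscernible with respect to B.
        key = tuple(row[a] for a in attributes)
        classes[key].add(obj)
    return list(classes.values())

# Hypothetical complete table: object -> attribute values.
table = {
    "o1": {"a": 1, "b": "x"},
    "o2": {"a": 1, "b": "x"},
    "o3": {"a": 0, "b": "x"},
    "o4": {"a": 0, "b": "y"},
}

classes_ab = indiscernibility_classes(table, ["a", "b"])  # finer partition
classes_a = indiscernibility_classes(table, ["a"])        # coarser partition
```

Using fewer attributes gives coarser classes: with B = {a, b} the objects o3 and o4 are distinguishable, whereas with B = {a} they fall into the same class.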



3.3 Information Table and Data Representation 



An Information Table (IT) is used in rough sets theory as a way of representing the data. The data in the IT are arranged based on their condition attributes and decision attribute ($\mathcal{D}$). Condition attributes and the decision attribute are analogous to the independent variables and the dependent variable (Goh and Law, 2003). These attributes are divided such that $C \cup \mathcal{D} = Q$ and $C \cap \mathcal{D} = \emptyset$. An IT can be classified into complete and incomplete classes. All objects in a complete class have known attribute values, whereas an IT is considered incomplete if at least one attribute variable has a missing value. An example of an incomplete IT is given in Table 1.

Data is represented by a table where each row represents an instance, sometimes referred to as an object. Every column represents an attribute, which can be a measured variable. This kind of table is also referred to as an Information System (Komorowski et al., 1999).

Table 1: An example of an Information Table with missing values

      x1    x2    x3     D
 1    1     1     0.2    B
 2    1     2     0.3    A
 3    0     1     0.3    B
 4    ?     ?     0.3    A
 5    0     3     0.4    A
 6    0     2     0.2    B
 7    1     4     ?      A



3.4 Decision Rules Induction 



Rough sets also involve generating decision rules for a given IT. The rules are normally determined based on condition attribute values (Goh and Law, 2003). The rules are presented in an if CONDITION(S)-then DECISION format. This paper will not directly focus on rule induction, since the major interest of this work is to estimate the missing data as opposed to taking the decision.



3.5 Set Approximation 



There are various properties of rough sets that have been presented in (Pawlak, 1991) and (Pawlak, 2002). Some of the properties are discussed below.



3.5.1 Lower and Upper Approximation of Sets 

The lower and upper approximations are defined on the basis of the indiscernibility relation discussed above. The lower approximation is defined as the collection of cases whose equivalence classes are contained in the cases that need to be approximated, whereas the upper approximation is defined as the collection of classes that are partially contained in the set that needs to be approximated (Rowland et al., 1998).

Let concept $X$ be defined as a set of all cases defined by a specific value of the decision. Any finite union of elementary sets associated with $B$ is called a $B$-definable set (Grzymala-Busse and Siddhaye, 2004). The set $X$ is approximated by two $B$-definable sets, referred to as the $B$-lower approximation, denoted by $\underline{B}X$, and the $B$-upper approximation, $\overline{B}X$. The $B$-lower approximation is defined as (Grzymala-Busse and Siddhaye, 2004)

$$\{x \in U \mid [x]_B \subseteq X\} \tag{5}$$

and the $B$-upper approximation is defined as

$$\{x \in U \mid [x]_B \cap X \neq \emptyset\} \tag{6}$$

There are other methods that have been reported in the literature for defining the lower and upper approximations for completely specified decision tables. Some of the common ones include approximating the lower and upper approximation of $X$ using Equations 7 and 8 respectively, as follows (Grzymala-Busse, 2004):

$$\cup\{[x]_B \mid x \in U, [x]_B \subseteq X\} \tag{7}$$

$$\cup\{[x]_B \mid x \in U, [x]_B \cap X \neq \emptyset\} \tag{8}$$
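The definitions in Eqs. (5) and (6) translate almost directly into set operations. The sketch below is a minimal Python illustration for a completely specified table; the table, the concept X and the function names are hypothetical.

```python
def equivalence_class(table, attributes, x):
    """Return [x]_B: all objects that agree with x on every attribute in B."""
    key = tuple(table[x][a] for a in attributes)
    return {y for y, row in table.items()
            if tuple(row[a] for a in attributes) == key}

def lower_approximation(table, attributes, concept):
    """Eq. (5): objects whose equivalence class lies entirely inside X."""
    return {x for x in table if equivalence_class(table, attributes, x) <= concept}

def upper_approximation(table, attributes, concept):
    """Eq. (6): objects whose equivalence class has a non-empty overlap with X."""
    return {x for x in table if equivalence_class(table, attributes, x) & concept}

# Hypothetical complete table and concept.
table = {
    "o1": {"a": 1, "b": 1},
    "o2": {"a": 1, "b": 1},
    "o3": {"a": 0, "b": 1},
    "o4": {"a": 0, "b": 0},
}
X = {"o1", "o3"}
B = ["a", "b"]

lower = lower_approximation(table, B, X)   # only o3 is certainly in X
upper = upper_approximation(table, B, X)   # o1, o2 and o3 are possibly in X
boundary = upper - lower                   # non-empty, so X is rough with respect to B
```

Here o1 and o2 are indiscernible but only o1 belongs to X, so neither can be placed in the lower approximation, while both appear in the upper one.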

The definition of definability is modified in cases of incompletely specified tables. In this case, any finite union of characteristic sets of $B$ is called a $B$-definable set. Three different definitions of approximations have been discussed by Grzymala-Busse and Siddhaye (2004). Again letting $B$ be a subset of $A$ of all attributes and $R(B)$ be the characteristic relation of the incomplete decision table with characteristic sets $K(x)$, where $x \in U$, the following are defined:

$$\underline{B}X = \{x \in U \mid K_B(x) \subseteq X\} \tag{9}$$

and

$$\overline{B}X = \{x \in U \mid K_B(x) \cap X \neq \emptyset\} \tag{10}$$

Equations 9 and 10 are referred to as singletons. The subset lower and upper approximations of incompletely specified data sets are then defined as:

$$\cup\{K_B(x) \mid x \in U, K_B(x) \subseteq X\} \tag{11}$$

and

$$\cup\{K_B(x) \mid x \in U, K_B(x) \cap X \neq \emptyset\} \tag{12}$$
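A minimal Python sketch of Eqs. (9) to (12) is given below, using the data of Table 1 with the blank x1 cells read as 0. The text does not spell out how the characteristic set K_B(x) is built, so the sketch assumes one common reading: y belongs to K_B(x) if, on every attribute whose value is known for x, y either has the same value or is missing. This assumption, the `None` marker and all function names are illustrative only.

```python
MISSING = None

# Condition attributes of Table 1 (blank x1 cells taken as 0, an assumption).
TABLE = {
    "o1": {"x1": 1, "x2": 1, "x3": 0.2},
    "o2": {"x1": 1, "x2": 2, "x3": 0.3},
    "o3": {"x1": 0, "x2": 1, "x3": 0.3},
    "o4": {"x1": MISSING, "x2": MISSING, "x3": 0.3},
    "o5": {"x1": 0, "x2": 3, "x3": 0.4},
    "o6": {"x1": 0, "x2": 2, "x3": 0.2},
    "o7": {"x1": 1, "x2": 4, "x3": MISSING},
}
X = {"o2", "o4", "o5", "o7"}   # concept: decision value A
B = ["x1", "x2", "x3"]

def characteristic_set(table, attributes, x):
    """Assumed K_B(x): objects compatible with x on all attributes known for x."""
    known = [a for a in attributes if table[x][a] is not MISSING]
    return {y for y, row in table.items()
            if all(row[a] is MISSING or row[a] == table[x][a] for a in known)}

K = {x: characteristic_set(TABLE, B, x) for x in TABLE}

# Singleton approximations, Eqs. (9) and (10).
singleton_lower = {x for x in TABLE if K[x] <= X}
singleton_upper = {x for x in TABLE if K[x] & X}

# Subset approximations, Eqs. (11) and (12): unions of whole characteristic sets.
subset_lower = set().union(*(K[x] for x in TABLE if K[x] <= X))
subset_upper = set().union(*(K[x] for x in TABLE if K[x] & X))
```

Under this reading the singleton lower approximation excludes o4, because its own characteristic set also contains o3, whereas the subset lower approximation includes o4 through the characteristic sets of o2 and o7. This illustrates that the two definitions can give different answers on the same incomplete table.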



More information on these methods can be found in (Grzymala-Busse, 2004; Grzymala-Busse and Hu, 2001; Grzymala-Busse, 1992; Grzymala-Busse and Siddhaye, 2004).

It follows from the properties that a crisp set is only defined if $\underline{B}(X) = \overline{B}(X)$. Roughness therefore is defined as the difference between the upper and the lower approximation.

3.5.2 Rough Membership Functions 

The rough membership function is a function $\mu_X^B : U \rightarrow [0, 1]$ that, when applied to object $x$, quantifies the degree of overlap between set $X$ and the indiscernibility set to which $x$ belongs. The rough membership function is used to calculate the plausibility, defined as

$$\mu_X^B(x) = \frac{|[x]_B \cap X|}{|[x]_B|} \tag{13}$$
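Eq. (13) can be evaluated directly once the equivalence class of an object is known. The short Python sketch below uses exact fractions; the classes and the concept are hypothetical values chosen for illustration.

```python
from fractions import Fraction

def rough_membership(equivalence_class, concept):
    """Eq. (13): share of the equivalence class [x]_B that lies inside X."""
    return Fraction(len(equivalence_class & concept), len(equivalence_class))

X = {"o1", "o3"}                                # hypothetical concept

mu_o1 = rough_membership({"o1", "o2"}, X)       # class only partly inside X
mu_o3 = rough_membership({"o3"}, X)             # class entirely inside X
mu_o4 = rough_membership({"o4"}, X)             # class entirely outside X
```

A membership of 1 corresponds to objects in the lower approximation, a membership of 0 to objects outside the upper approximation, and values strictly between 0 and 1 to objects in the boundary region.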



4 Missing Data Imputation Based on Rough Sets 

The algorithm implemented here imputes the missing values by presenting a list of all possible values, based on the observed data. As mentioned earlier, the hypothesis here is that in most finite databases, a case similar to the missing data case could have been observed before. It therefore should be cheaper to use such values, instead of computing missing values with complex methods such as neural networks. The algorithm implemented is shown in Algorithm 1, followed by a worked example demonstrating how the missing values are imputed. There are two approaches to reconstructing the missing values. The missing values can either be probabilistically interpreted or be possibilistically interpreted (Nakata and Sakai, 2006).



Algorithm 1: Rough sets based missing data imputation algorithm
input : Incomplete data set $\mathcal{A}$ with $a$ attributes and $i$ instances.
        All these instances should belong to a decision $\mathcal{D}$
output : A vector containing possible missing values
Assumption: $\mathcal{D}$ and some attributes will always be known
forall i do
    → Partition the input space according to $\mathcal{D}$
    → Arrange all attributes according to order of availability, with $\mathcal{D}$ being first.
end

foreach attribute do
    → Without directly extracting the rules, use the available information to
      extract relationships to other instances $i$ in $\mathcal{A}$.
    → The family of equivalent classes $\varepsilon(a)$ containing each object $o_i$ for all input
      attributes is computed.
    → The degree of belongingness is $\kappa(o[x_{14}] = o'[x_{14}]) = 1/|dom(a_{missing})|$, where $o \neq o'$ and
      $dom(x_{14})$ denotes the domain of attribute $x_{14}$, which is the fourth instance of $x_1$,
      and $|dom(x_{14})|$ is the cardinality of $dom(x_{14})$
    while extracting relationships do
        if i has the same attribute values as $a_j$ everywhere except for the missing
        value, replace the missing value, $a_{missing}$, with the value $v_j$ from $a_j$, where
        $j$ is an index to another instance.
        Otherwise proceed to the next step
    end
    → Compute the lower approximation of each attribute, given the available data
      of the same instance with the missing value.
    while doing this do
        if more than one $v_j$ value is suitable for the estimation, postpone the
        replacement for later, when it will be clear which value is appropriate
    end
    → Compute the incomplete upper approximations of each subset partition.
    → Do the computation and imputation of missing data as was done with the
      lower approximation.
    → Either crisp sets will be found; otherwise, rough sets can be used and
      missing data can be heuristically selected from the obtained rough set.
end
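The core idea of Algorithm 1 can be sketched in Python as follows. This is a simplified reading and not the implementation used in this work: it partitions the data by decision, looks for instances that agree with the incomplete instance on every attribute both have observed, and returns the list of candidate values for each missing cell. Treating an instance with no shared observed attributes as compatible is an assumption of the sketch, as are the function names and the use of `None` for missing values. The data are those of Table 1 with the blank x1 cells read as 0.

```python
def candidate_values(table, decision, target, attribute):
    """Values of `attribute` seen in rows that share the target's decision and
    agree with it on every attribute that both rows have observed."""
    row = table[target]
    candidates = set()
    for other, other_row in table.items():
        if other == target or decision[other] != decision[target]:
            continue                      # stay inside the same decision partition
        if other_row[attribute] is None:
            continue                      # this row cannot supply a value
        shared = [a for a in row if a != attribute
                  and row[a] is not None and other_row[a] is not None]
        # Assumption: rows with no shared observed attributes count as compatible.
        if all(row[a] == other_row[a] for a in shared):
            candidates.add(other_row[attribute])
    return candidates

def impute(table, decision):
    """Return {(object, attribute): sorted candidates} for every missing cell."""
    result = {}
    for obj, row in table.items():
        for attribute, value in row.items():
            if value is None:
                found = candidate_values(table, decision, obj, attribute)
                result[(obj, attribute)] = sorted(found)
    return result

table = {
    "o1": {"x1": 1, "x2": 1, "x3": 0.2},
    "o2": {"x1": 1, "x2": 2, "x3": 0.3},
    "o3": {"x1": 0, "x2": 1, "x3": 0.3},
    "o4": {"x1": None, "x2": None, "x3": 0.3},
    "o5": {"x1": 0, "x2": 3, "x3": 0.4},
    "o6": {"x1": 0, "x2": 2, "x3": 0.2},
    "o7": {"x1": 1, "x2": 4, "x3": None},
}
decision = {"o1": "B", "o2": "A", "o3": "B", "o4": "A",
            "o5": "A", "o6": "B", "o7": "A"}

possible = impute(table, decision)
```

A cell with a single candidate corresponds to the crisp case, where the value can be imputed directly. A cell with several candidates corresponds to the rough case, where the replacement is postponed or a value is chosen heuristically from the list.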



In our example, the degree of belongingness is $\kappa(o[x_{14}] = o'[x_{14}]) = 1/|dom(x_{14})|$, where $o \neq o'$ and $dom(x_{14})$ denotes the domain of attribute $x_{14}$, which is the fourth instance of $x_1$, and $|dom(x_{14})|$ is the cardinality of $dom(x_{14})$. If the missing values were to be possibilistically interpreted, all attribute values have the same possibilistic degree of being the actual one.

The algorithm in this study is fully dependent on the available data and makes no additional assumptions about the data or the distribution thereof. As presented in the algorithm, a list of possible values is given in a case where a crisp set could not be found. It is from this list that possible values may be heuristically chosen. A justification for this is that it is not always the case that we need to know the exact value. As a result, it may be cheaper to have a rough value. The possible imputable values are obtained by collecting all the entries that lead to a particular decision $\mathcal{D}$. The algorithm used in this application is a simplified version of the algorithm of Hong et al. (2002).

The algorithm will now be illustrated using an example. Missing values will be denoted by the question mark (?) symbol. Attribute values of attribute $a$ are denoted as $V_a$. Using the notation defined in Gediga and Duntsch (2003), we let $rel_Q(x)$ represent a set of all $Q$-relevant attributes of $x$. Assume an IT as presented in Table 1, where $x_1$ is in binary form, $x_2 \in [1 : 5]$ takes integer values and $x_3$ can be 0.2, 0.3 or 0.4.

The algorithm first seeks relationships between variables. Since this is a small database, it is assumed that the only variable that will always be known is the decision. The first step will be to partition the data according to the decision, and this could be done as follows:



$$\varepsilon(\mathcal{D}) = \{o_1, o_3, o_6\}, \{o_2, o_4, o_5, o_7\}$$

Two partitions are obtained due to the binary nature of the decision in the chosen example. The next step is to extract indiscernible relationships within each attribute. For $x_1$, the following is obtained:

$$IND(x_1) = \{(o_1, o_1), (o_1, o_2), (o_1, o_4), (o_1, o_7), (o_2, o_2), (o_2, o_4), (o_2, o_7), (o_3, o_3), (o_3, o_4), (o_3, o_5), (o_3, o_6), (o_4, o_4), (o_4, o_5), (o_4, o_6), (o_4, o_7), (o_5, o_5), (o_5, o_6), (o_6, o_6), (o_7, o_7)\}$$

The family of equivalent classes $\varepsilon(x_1)$ containing each object $o_i$ for all input variables is computed as follows:

$$\varepsilon(x_1) = \{o_1, o_2, o_4, o_7\}, \{o_3, o_4, o_5, o_6\}$$

Similarly,

$$\varepsilon(x_2) = \{o_1, o_3, o_4\}, \{o_2, o_4, o_6\}, \{o_4, o_5\}, \{o_4, o_7\}, \{o_4\}$$

and

$$\varepsilon(x_3) = \{o_1, o_6, o_7\}, \{o_2, o_3, o_4, o_7\}, \{o_5, o_7\}$$
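The families of classes in the worked example can be reproduced with a short Python sketch. It assumes, as the example does, that an object with a missing value on an attribute is placed in every class of that attribute, and it reads the blank x1 cells of Table 1 as 0. The function name `possible_classes` and the use of `None` for missing values are illustrative choices.

```python
from fractions import Fraction

def possible_classes(column, domain):
    """For each value in the domain, collect the objects that have that value
    or whose value is missing (and could therefore be that value)."""
    return {v: {o for o, val in column.items() if val == v or val is None}
            for v in domain}

# Columns of Table 1; None marks a missing value.
x1 = {"o1": 1, "o2": 1, "o3": 0, "o4": None, "o5": 0, "o6": 0, "o7": 1}
x2 = {"o1": 1, "o2": 2, "o3": 1, "o4": None, "o5": 3, "o6": 2, "o7": 4}
x3 = {"o1": 0.2, "o2": 0.3, "o3": 0.3, "o4": 0.3, "o5": 0.4, "o6": 0.2, "o7": None}

classes_x1 = possible_classes(x1, [1, 0])
classes_x2 = possible_classes(x2, [1, 2, 3, 4, 5])
classes_x3 = possible_classes(x3, [0.2, 0.3, 0.4])

# Degree of belongingness of a missing binary value: 1 / |dom(x1)|.
kappa_x1 = Fraction(1, len(classes_x1))
```

Because x1 is binary, the missing value of the fourth object is equally likely to fall in either class, which gives the degree 1/2 used in the probabilistic interpretation.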

In our example, the degree of belongingness is $\kappa(o[x_{14}] = o'[x_{14}]) = 1/|dom(x_{14})|$, where $o \neq o'$ and $dom(x_{14})$ denotes the domain of attribute $x_{14}$, which is the fourth instance of $x_1$, and $|dom(x_{14})|$ is the cardinality of $dom(x_{14})$. If the missing values were to be possibilistically interpreted, each attribute value has the same possibilistic degree of being the actual one. The lower approximation is defined as:

$$\underline{A}(X_{miss}, \{X_{avail}, \mathcal{D}\}) = \{E(X_{miss}) \mid \exists (X_{avail}, \mathcal{D}),\ E(X) \subseteq (X_{avail}, \mathcal{D})\} \tag{14}$$

whereas the upper approximation is defined as

$$\overline{A}(X_{miss}, \{X_{avail}, \mathcal{D}\}) = \{E(X_{miss}) \mid \exists (X_{avail}, \mathcal{D}),\ E(X) \cap X_{avail} \cap \mathcal{D}\} \tag{15}$$

Using $IND(x_1)$, the families of all possible classes containing $o_4$ are given by

$$Poss\,\varepsilon(x_1)_{o_i} = \{o_1, o_2, o_7\}, \{o_1, o_2, o_4, o_7\}, \quad i = 1, 2, 7$$
$$Poss\,\varepsilon(x_1)_{o_i} = \{o_3, o_5, o_6\}, \{o_3, o_4, o_5, o_6\}, \quad i = 3, 5, 6$$
$$Poss\,\varepsilon(x_1)_{o_4} = \{o_4, o_1, o_2, o_7\}, \{o_3, o_4, o_5, o_6\}$$

The probabilistic degree to which we can be sure that the chosen value is the right one is given by (Nakata and Sakai, 2006)

$$\kappa(\{o_i\} \in \varepsilon(x_1)) = 1/2, \quad i = 1, 2, 7$$
$$\kappa(\{o_i\} \in \varepsilon(x_1)) = 1/2, \quad i = 3, 5, 6$$
$$\kappa(\{o_i\} \in \varepsilon(x_1)) = 1/2, \quad i = 4$$

else

$$\kappa(\{o_i\} \in \varepsilon(x_1)) = 0$$

The else part applies to all other conditions, such as $\kappa(\{o_1, o_2, o_3\} \in \varepsilon(x_1)) = 0$. A family of weighted equivalent classes is now computed as follows:

$$\varepsilon(x_1) = \{\{o_1, o_2, o_4, o_7\}\{1/2\}\}, \{\{o_3, o_4, o_5, o_6\}\{1/2\}\}$$



The values $\varepsilon(x_2)$ and $\varepsilon(x_3)$ are computed in a similar way. We then use these families of weighted equivalent classes to obtain the lower and upper approximations as presented above. The degree to which the object $o$ has the same value as object $o'$ on the attributes is referred to as the degree of belongingness and is defined in terms of the binary relation for indiscernibility as (Nakata and Sakai, 2006):

$$IND(X) = \{((o, o'), \kappa(o[X] = o'[X])) \mid (\kappa(o[X] = o'[X]) \neq 0) \wedge (o \neq o')\} \cup \{((o, o), 1)\}$$

where $\kappa(o[X] = o'[X])$ is the indiscernibility degree of the objects $o$ and $o'$, and this is equal to the degree of belongingness,

$$\kappa(o[X] = o'[X]) = \bigotimes_{A_i \in X} \kappa(o[A_i] = o'[A_i])$$

where the operator $\otimes$ depends on whether the missing values are possibilistically or probabilistically interpreted. For the probabilistic interpretation, the operator is a product, denoted by $\times$; otherwise the operator min is used.

5 Experimental Evaluation

5.1 Database

The data used in this test was obtained from the South African antenatal sero-prevalence survey of 2001. The data for this survey is obtained from questionnaires answered by pregnant women visiting selected public clinics in South Africa. Only women participating for the first time in the survey were eligible to answer the questionnaire.

Data attributes used in this study are the HIV status, education level, gravidity, parity, age, age of the father, race and region. The HIV status is the decision and is represented in a binary form, where 0 and 1 represent negative and positive respectively. Race is measured on the scale 1 to 4, where 1, 2, 3 and 4 represent African, Coloured, White and Asian, respectively. The data used was obtained in three regions, which are referred to as region A, B and C in this investigation. The education level was measured using integers representing the highest grade successfully completed, with 13 representing tertiary education. Gravidity is the number of pregnancies, complete or incomplete, experienced by a female, and this variable is represented by an integer between 0 and 11. Parity is the number of times the individual has given birth, and multiple births are counted as one. Both parity and gravidity are important, as they show the reproductive activity as well as the reproductive health state of the women. Age gap is a measure of the age difference between the pregnant woman and the prospective father of the child. A sample of this data set is shown in Table 2.



Table 2: Extract of the HIV database used, with missing values

 Race   Region   Educ   Gravid   Parity   Age   Father's age   HIV
 1      C        ?      1        2        35    41             0
 2      B        13     1        0        20    22             0
 3      ?        10     2        0        ?     27             1
 2      C        12     1        ?        20    33             1
 3      B        9      ?        2        25    28             0
 ?      C        9      2        1        26    27             0
 2      A        7      1        0        15    ?              0
 1      C        ?      4        ?        25    28             0
 4      A        7      1        0        15    29             1
 1      B        11     1        0        20    22             1



5.2 Data Preprocessing 

As mentioned in a previous section, the HIV/AIDS data that is used in this work is obtained from a survey performed on pregnant women. Like all data in raw form, there are several steps that need to be taken in order to ensure the data is in usable form. Several types of problematic records were identified in the data. Firstly, some of the data records were not complete. This is probably because the people being surveyed omitted certain information, or because of errors made by the person who manually recorded the surveys onto a spreadsheet. Secondly, some outliers arose from incorrectly entered variables. For instance, gravidity is defined as the number of times a woman has been pregnant and parity is described as the number of times a woman has given birth. In any instance where the value of parity is greater than that of gravidity, the whole observation was considered an outlier and was removed. The justification for this is that it is not possible for a woman to give birth more times than she has been pregnant.
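The consistency check described above can be sketched as a simple filter in Python. The field names `gravidity` and `parity`, the use of `None` for missing entries and the sample records are assumptions made for illustration. Records with either value missing are kept, since they cannot be shown to be inconsistent.

```python
def remove_inconsistent(records):
    """Drop records where parity exceeds gravidity; keep records where either
    value is missing, because no inconsistency can be established."""
    kept = []
    for record in records:
        gravidity = record.get("gravidity")
        parity = record.get("parity")
        if gravidity is not None and parity is not None and parity > gravidity:
            continue  # more births than pregnancies: treated as an outlier
        kept.append(record)
    return kept

# Illustrative records only.
sample = [
    {"gravidity": 2, "parity": 1},
    {"gravidity": 1, "parity": 3},      # inconsistent, removed
    {"gravidity": None, "parity": 2},   # incomplete, kept for imputation
    {"gravidity": 4, "parity": 4},
]
cleaned = remove_inconsistent(sample)
```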

5.3 Variable Discretisation 

The discretisation defines the granularity with which we would like to analyse the universe of discourse. If one chooses to discretise the variables into a large number of categories, the rules extracted are more complex to analyse. Therefore, if one would like to use rough sets for rule analysis and interpretation rather than for classification, it is advisable that the number of categories be as small as possible. For the purposes of this work, the input variables have been discretised into four categories. A description of the categories and their definition is shown in Table 3. Table 4 shows the simplified version of the information system shown in Table 2.
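As an illustration of the discretisation step, the Python sketch below maps the education grade onto the four categories listed in Table 3 (Zero, P, S and T). The function name and the use of `None` for a missing value are assumptions of the sketch. Missing values are passed through unchanged so that they can still be imputed afterwards.

```python
def discretise_education(grade):
    """Map the highest grade completed onto the categories of Table 3."""
    if grade is None:
        return None          # missing values stay missing
    if grade == 0:
        return "Zero"
    if 1 <= grade <= 7:
        return "P"           # primary
    if 8 <= grade <= 12:
        return "S"           # secondary
    if grade == 13:
        return "T"           # tertiary
    raise ValueError(f"education grade out of range: {grade}")

# Education column of Table 2, with None for '?'.
educ = [None, 13, 10, 12, 9, 9, 7, None, 7, 11]
educ_discrete = [discretise_education(g) for g in educ]
```

Applied to the education column of Table 2, this mapping yields the education column of Table 4.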



Table 3: A table showing the discretised variables.

 Race   Age         Education    Gravidity    Parity       Father's Age   HIV
 1      < 19        Zero (0)     Low (< 3)    Low (< 3)    < 19           0
 2      [20 - 29]   P (1 - 7)    High (> 3)   High (> 3)   [20 - 29]      1
 3      [30 - 39]   S (8 - 12)   -            -            [30 - 39]      -
 4      > 40        T (13)       -            -            > 40           -


5.4 Results and Discussion 

The experimentation was performed using both the original and the simplified data sets. Results obtained in both cases are summarised in Table 5.

It can be seen that the prediction accuracy is much higher for the generalised data set. This is because the number of states has been reduced. Furthermore, the likelihood of being correct is higher if one only has to give a rough estimate rather than an exact value. For instance, instead of saying that someone has a highest level of education of 10, it is much safer to say that they have secondary education. Although this approach leaves out details, it is often the case that the left-out details are not required. In a decision system such as the one considered in this paper, knowing that the prospective father is 19 years old may carry the same weight as saying that the father is a teenager.

Table 4: Extract of the HIV database used, with missing values after discretisation

 Race   Region   Educ   Gravid   Parity   Age       Father's age   HIV
 1      C        ?      < 3      < 3      [31:40]   [41:50]        0
 2      B        T      < 3      < 3      < 20      [21:30]        0
 3      ?        S      < 3      < 3      ?         [21:30]        1
 2      C        S      < 3      ?        < 20      [31:40]        1
 3      B        S      ?        < 3      [21:30]   [21:30]        0
 ?      C        S      < 3      < 3      [21:30]   [21:30]        0
 2      A        P      < 3      < 3      < 20      ?              0
 1      C        ?      > 3      ?        [21:30]   [21:30]        0
 4      A        P      < 3      < 3      < 20      [21:30]        1
 1      B        S      < 3      < 3      < 20      [21:30]        1

Table 5: Missing data estimation results (accuracy, %) for both the original data and the generalised data

               Education   Gravidity   Parity   Father's age
 Original      83.1        86.5        87.8     74.7
 Generalised   99.3        99.2        99       98.5



6 Conclusion 

Rough sets have been used for missing data imputation, and characteristic relations were introduced to describe incompletely specified decision tables. It has been shown that the basic rough set idea of lower and upper approximations for incompletely specified decision tables may be defined in a variety of different ways. The technique was tested with a real database, and the results with the HIV database are acceptable, with accuracies ranging from 74.7% to 99.3%. One drawback of this method is that it makes no extrapolation or interpolation and, as a result, can only be used if the missing case is similar or related to another case with full or more complete observations.